Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms
نویسنده
چکیده
With the goal of reducing computational costs without sacrificing accuracy, we describe two algorithms to find sets of prototypes for nearest neighbor classification. Here, the term “prototypes” refers to the reference instances used in a nearest neighbor computation — the instances with respect to which similarity is assessed in order to assign a class to a new data item. Both algorithms rely on stochastic techniques to search the space of sets of prototypes and are simple to implement. The first is a Monte Carlo sampling algorithm; the second applies random mutation hill climbing. On four datasets we show that only three or four prototypes sufficed to give predictive accuracy equal or superior to a basic nearest neighbor algorithm whose run-time storage costs were approximately 10 to 200 times greater. We briefly investigate how random mutation hill climbing may be applied to select features and prototypes simultaneously. Finally, we explain the performance of the sampling algorithm on these datasets in terms of a statistical measure of the extent of clustering displayed by the target classes.
منابع مشابه
Feature Subset Selection as Search with Probabilistic Estimates
Irrelevant features and weakly relevant features may reduce the comprehensibility and accuracy of concepts induced by supervised learning algorithms. We formulate the search for a feature subset as an abstract search problem with probabilistic estimates. Searching a space using an evaluation function that is a random variable requires trading off accuracy of estimates for increased state explor...
متن کاملSubset Selection as Search with Probabilistic Estimates
Irrelevant features and weakly relevant features may reduce the comprehensibility and accuracy of concepts induced by supervised learning algorithms. We formulate the search for a feature subset as an abstract search problem with probabilistic estimates. Searching a space using an evaluation function that is a random variable requires trading o accuracy of estimates for increased state explorat...
متن کاملSign Language Classification Using Search Algorithms
We concentrate on the problem of feature selection on sign language classification. The goal is to maximize the classification accuracy using a proper subset of features out of totally 22 given features. In this work, we employ two search methods: Hill Climbing approach and Random Walk approach, to select the features. We claim that both algorithms are easy-implemented but reasonable and effici...
متن کاملInduction Algorithm Induction Algorithm
Irrelevant features and weakly relevant features may reduce the comprehensibility and accuracy of concepts induced by supervised learning algorithms. We formulate the search for a feature subset as an abstract search problem with probabilistic estimates. Searching a space using an evaluation function that is a random variable requires trading oo accuracy of estimates for increased state explora...
متن کاملComparison of Genetic and Hill Climbing Algorithms to Improve an Artificial Neural Networks Model for Water Consumption Prediction
No unique method has been so far specified for determining the number of neurons in hidden layers of Multi-Layer Perceptron (MLP) neural networks used for prediction. The present research is intended to optimize the number of neurons using two meta-heuristic procedures namely genetic and hill climbing algorithms. The data used in the present research for prediction are consumption data of water...
متن کامل